The purpose of this script is to compute a measure of linearity during reading of the short story.
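Linear navigation was defined as movement forwards in the text without skipping any pages. Nonlinear navigation, on the other hand, was defined as either a regression (any movement backwards in the text) or a forward leap (forward movement to a position beyond the next page, made either by turning pages at browsing speed or by using the progress bar).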
Browsing was included in the definition of nonlinearity because the intention behind browsing is to move between locations in the text rather than to read. Browsing speed was defined as spending less than one second on a page-view. One second is not enough time to read the text on a page; at most, it allows participants to get the gist of the information.
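As a minimal sketch of the browsing criterion, the snippet below flags page-views that last less than one second. The vector `page_view_minutes` and its values are hypothetical, and storing durations in minutes is an assumption made for this illustration.

```r
# Hypothetical page-view durations, assumed to be stored in minutes
page_view_minutes <- c(0.0059, 0.0137, 0.10, 0.65)

# Browsing: less than one second spent on a page-view
browsing_threshold_seconds <- 1
is_browsing <- (page_view_minutes * 60) < browsing_threshold_seconds

print(is_browsing)
```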
To measure the frequency of nonlinear navigation, we calculate how often a nonlinear navigation event is initiated. Nonlinear navigation is initiated if a nonlinear event follows a linear navigation event, or follows a nonlinear event in a different direction (e.g. a forward leap following a regression) or executed via a different method (e.g. a regression made with the progress bar following a regression made by turning pages backwards). Simply put, the current event (k) is nonlinear, and its linearity, method, or direction of nonlinear navigation does not match that of the previous event (k-1). This measure therefore reflects how frequently nonlinear navigation is initiated while reading the story.
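The definition can be illustrated on a toy sequence of events. This is a sketch under simplifying assumptions: the label vector is hypothetical, and only the linearity type is compared between events k-1 and k (the navigation method is not modelled here).

```r
# Hypothetical sequence of linearity labels for one reader, one per event
linearity <- c(
  "LinearOrNonNavigation", "Regression", "Regression",
  "ForwardLeap", "LinearOrNonNavigation", "Regression"
)

is_nonlinear <- linearity != "LinearOrNonNavigation"

# Label of the previous event (k-1); the first event has none
previous_linearity <- c(NA, head(linearity, -1))

# Event k initiates nonlinearity if it is nonlinear and
# its label differs from that of event k-1
starts_nonlinearity <- is_nonlinear &
  (is.na(previous_linearity) | previous_linearity != linearity)

print(starts_nonlinearity)
```

In this toy sequence the second regression continues the first one, so only three of the four nonlinear events count as initiations.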
Frequency of initiating nonlinearity was used as the measure of linearity of reading, instead of the linearity categories (linear or nonnavigation, regression, or forward leap), to make sure that each navigation event is only counted once. Using the linearity categories would inflate the amount of nonlinear navigation whenever a single navigation spans multiple pages (e.g. a regression in which the participant goes several pages backwards by page turns).
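To make the inflation concrete, the sketch below compares the two ways of counting for a hypothetical reader who regresses four pages by page turns. Using `rle()` to count runs is a choice made for this illustration, not necessarily how the measure is computed later in the script.

```r
# Hypothetical reader who regresses four pages by page turns, then reads on
linearity <- c(
  "LinearOrNonNavigation",
  "Regression", "Regression", "Regression", "Regression",
  "LinearOrNonNavigation"
)

# Counting by category: every page turn of the regression is counted
category_count <- sum(linearity != "LinearOrNonNavigation")

# Counting initiations: each run of identical nonlinear labels counts once
runs <- rle(linearity)
initiation_count <- sum(runs$values != "LinearOrNonNavigation")

print(c(category = category_count, initiations = initiation_count))
```

The category count reports four nonlinear events, whereas the initiation count reports the single regression the reader actually made.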
This script uses a dataframe that was wrangled in Prep_TrackingDataWrangling.Rmd.
The working directory is not changed with setwd() because this script is knit remotely in other scripts.
if (exists("ExternalAnalysisFilePath")) {
# ExternalAnalysisFilePath: ~/Short_Story_Reading_Behaviour_Public/
mypath_SSRB <- ExternalAnalysisFilePath
} else if (grepl("Prep", getwd())) {
mypath_SSRB <- dirname(getwd())
} else if (grepl("Short_Story_Reading_Behaviour_Public", getwd())) {
mypath_SSRB <- getwd()
} else {
# get working directory manually
mypath_SSRB <- paste0(
dirname(getwd()),
"/Short_Story_Reading_Behaviour_Public"
)
}
library(tidyverse)
library(tidyr)
library(dplyr)
library(plotly) # interactive plots
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
getwd() # working directory should be ~/Short_Story_Reading_Behaviour_Public or its Prep subfolder
## [1] "C:/Users/Pauliina/Documents/GITHUB/Short_Story_Reading_Behaviour_Public/Prep"
To determine linearity, we use grouped_tracking_data, which was created in Prep_TrackingDataWrangling.Rmd.
grouped_tracking_data <-
read.csv(
paste0(
mypath_SSRB,
"/Data/wrangled_grouped_tracking_data.csv"
),
header = TRUE,
sep = ";",
dec = ","
)
grouped_tracking_data <- dplyr::select(grouped_tracking_data, -X, -X.1)
str(grouped_tracking_data)
## 'data.frame': 3175 obs. of 96 variables:
## $ StoryId : int 7 7 7 7 7 7 7 7 7 7 ...
## $ UserId : int 6 6 6 6 6 6 6 6 6 6 ...
## $ ReadingSessionNumber : int 4 4 4 4 4 4 4 4 4 4 ...
## $ EngagementTypeId : int 1 2 2 2 2 2 3 4 4 4 ...
## $ Id : int 7784 7785 7788 7789 7819 7821 7850 7851 7852 7853 ...
## $ NavigationBlockNumber : int 27 28 29 30 31 32 33 34 35 36 ...
## $ BaselineSpeed : num 320 320 320 320 320 ...
## $ IsBaselineSpeedAdjusted : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ AdjustedBaselineSpeed : num NA NA NA NA NA NA NA NA NA NA ...
## $ IsIntrinsicCondition : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ Date : chr "2020-05-16" "2020-05-16" "2020-05-16" "2020-05-16" ...
## $ Time : chr "2022-09-29 11:03:13.290" "2022-09-29 11:03:14.109" "2022-09-29 11:03:21.053" "2022-09-29 11:03:21.409" ...
## $ TimeBeforeDeadlinesDays : num 3.99 3.99 3.99 3.99 3.98 ...
## $ TimeBeforeDeadlineMinutes : num 5741 5741 5741 5741 5728 ...
## $ Type : chr "openBook" "openPage" "keyboardBackward" "openPage" ...
## $ StartLocation : int 0 0 861 0 16134 17126 31630 31630 31630 26583 ...
## $ VisibleCharacterCount : int 0 861 1032 861 992 1083 733 733 733 1074 ...
## $ VisibleWordCount : int 0 179 240 179 243 239 172 172 172 244 ...
## $ PageInSection : int 0 1 2 1 16 17 31 31 31 26 ...
## $ TotalPagesInSection : int 0 31 31 31 31 31 31 31 31 31 ...
## $ Device : chr "Other" "Other" "Other" "Other" ...
## $ OperatingSystem : chr "Windows" "Windows" "Windows" "Windows" ...
## $ Browser : chr "Chrome" "Chrome" "Chrome" "Chrome" ...
## $ ReadingBlockNumber : int 8 9 10 11 26 27 41 41 41 42 ...
## $ IsBlurred : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsDialogOpen : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsMenuOpen : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsInactive : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsReading : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ TimezoneOffset : int -60 -60 -60 -60 -60 -60 -60 -60 -60 -60 ...
## $ WindowHeight : int 578 578 578 578 578 578 578 578 578 578 ...
## $ WindowWidth : int 1280 1280 1280 1280 1280 1280 1280 1280 1280 1280 ...
## $ IsSelectionOpen : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsOpeningEvent : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ IsClosingEvent : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ IsPageOpen : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
## $ IsDurationFixed : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ ReadingBlockDuration : num 0.0137 0.1046 0.0171 0.6495 4.4955 ...
## $ EngagedReadingDuration : num 0 0.1046 0.0171 0.6495 4.4955 ...
## $ EngagedReadingSpeed : num NA 1711.8 14035.1 275.6 54.1 ...
## $ EngagedSpeedLabel : chr "" "Scanning" "Scanning" "DeepReading" ...
## $ Direction : chr "" "" "Backward" "" ...
## $ NavigationBlockDirection : chr "" "Forward" "Backward" "Forward" ...
## $ Condition : chr "NonAutonomousCondition" "NonAutonomousCondition" "NonAutonomousCondition" "NonAutonomousCondition" ...
## $ IsLastEventInReadingSession : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ DurationMinutes : num 0.01365 0.09843 0.00593 0.64272 4.4897 ...
## $ IsAdjustedDuration : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ NavigationBlockDuration : num 0.01365 0.11573 0.00593 12.67953 4.49552 ...
## $ ContinuousEngagementMinutes : num 0.0137 28.2973 28.2973 28.2973 28.2973 ...
## $ ContinuousEngagementSeconds : num 0.819 1697.837 1697.837 1697.837 1697.837 ...
## $ Engagement : chr "Disengagement" "Engagement" "Engagement" "Engagement" ...
## $ PotentialReadingSessionArtefact: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ EndLocation : int 0 861 1893 861 17126 18209 32363 32363 32363 27657 ...
## $ VisibleColumns : chr "TwoColumns" "TwoColumns" "TwoColumns" "TwoColumns" ...
## $ CumulativeRSTime : num 0.0137 0.1121 0.1353 0.778 17.3045 ...
## $ IsEngagement : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
## $ CumulativeEngagementTimeInRS : num 0 0.0984 0.1217 0.7644 17.2909 ...
## $ CumulativeEngagementTime : num 0 0.0984 0.1217 0.7644 17.2909 ...
## $ Author : chr "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" ...
## $ Title : chr "The Yates Pride" "The Yates Pride" "The Yates Pride" "The Yates Pride" ...
## $ AverageWordFrequency : num 4.07 4.07 4.07 4.07 4.07 ...
## $ SDWordFrequency : num 14.2 14.2 14.2 14.2 14.2 ...
## $ NumberOfWords : int 1458 1458 1458 1458 1458 1458 1458 1458 1458 1458 ...
## $ NumberOfSentences : int 461 461 461 461 461 461 461 461 461 461 ...
## $ AverageSentenceLength : num 15.4 15.4 15.4 15.4 15.4 ...
## $ SDSentenceLength : num 11.4 11.4 11.4 11.4 11.4 ...
## $ CharacterLength : int 32363 32363 32363 32363 32363 32363 32363 32363 32363 32363 ...
## $ WordLength : int 7253 7253 7253 7253 7253 7253 7253 7253 7253 7253 ...
## $ AverageRating : num 2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 ...
## $ SDRating : num 1.13 1.13 1.13 1.13 1.13 ...
## $ MedianRating : int 2 2 2 2 2 2 2 2 2 2 ...
## $ MinRating : int 1 1 1 1 1 1 1 1 1 1 ...
## $ MaxRating : int 5 5 5 5 5 5 5 5 5 5 ...
## $ PublicationYear : int 1912 1912 1912 1912 1912 1912 1912 1912 1912 1912 ...
## $ PublicationType : chr "PublicDomain" "PublicDomain" "PublicDomain" "PublicDomain" ...
## $ Genre : chr "Romance" "Romance" "Romance" "Romance" ...
## $ Percentage : num 0 0 0.0266 0 0.4985 ...
## $ NBFirstStartLocation : int 0 0 861 0 16134 17126 31630 31630 31630 26583 ...
## $ NBLastStartLocation : int 0 861 861 15327 16134 31630 31630 31630 31630 26583 ...
## $ NBFirstVisibleCharacterCount : int 0 861 1032 861 992 1083 733 733 733 1074 ...
## $ NBLastVisibleCharacterCount : int 0 1032 1032 807 992 733 733 733 733 1074 ...
## $ NBFirstReadingBlock : int 8 9 10 11 26 27 41 41 41 42 ...
## $ NBLastReadingBlock : int 8 10 10 25 26 41 41 41 41 42 ...
## $ NBFirstPage : int 0 1 2 1 16 17 31 31 31 26 ...
## $ NBLastPage : int 0 2 2 15 16 31 31 31 31 26 ...
## $ NBFirstPagesInSection : int 0 31 31 31 31 31 31 31 31 31 ...
## $ NBLastPagesInSection : int 0 31 31 31 31 31 31 31 31 31 ...
## $ NBEndLocation : int 0 1893 1893 16134 17126 32363 32363 32363 32363 27657 ...
## $ NBAverageReadingSpeed : num NA 5819.6 14035.1 285.1 54.1 ...
## $ NBFirstCumulativeRSTime : num 0.0137 0.1121 0.1353 0.778 17.3045 ...
## $ NBLastCumulativeRSTime : num 0.0137 0.1294 0.1353 12.8148 17.3104 ...
## $ NBDuration : num 0.01365 0.11573 0.00593 12.67953 4.49552 ...
## $ NBSpeedLabel : chr "" "Scanning" "Scanning" "DeepReading" ...
## $ NBAnyProgressBarUsage : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ NBFirstTimeBeforeDeadline : num 3.99 3.99 3.99 3.99 3.98 ...
## $ NBLastTimeBeforeDeadline : num 3.99 3.99 3.99 3.98 3.97 ...
source(
paste0(
mypath_SSRB,
"/Functions/Functions_VariableTypeConversion.R"
)
)
## turn user and story indicators into factors
grouped_tracking_data[, c(
"UserId",
"StoryId"
)] <- convert.magic(
grouped_tracking_data[, c(
"UserId",
"StoryId"
)],
"factor"
)
## turn reading block (page view) number,
### reading session number, and
#### navigation block number into ordered factors
grouped_tracking_data[, c(
"ReadingBlockNumber",
"ReadingSessionNumber",
"NavigationBlockNumber"
)] <- convert.magic(
grouped_tracking_data[, c(
"ReadingBlockNumber",
"ReadingSessionNumber",
"NavigationBlockNumber"
)],
"ordered"
)
## fix date and time variable types
grouped_tracking_data$Date <-
as.Date(
grouped_tracking_data$Date,
format = "%Y-%m-%d"
)
TempTimeObject <- strsplit(grouped_tracking_data$Time, " ")
TempTimeObject <- sapply(TempTimeObject, "[[", 2)
grouped_tracking_data$TempTimeObject <- TempTimeObject
grouped_tracking_data <- dplyr::select(grouped_tracking_data, -Time)
names(grouped_tracking_data)[which(colnames(grouped_tracking_data) == "TempTimeObject")] <- "Time"
grouped_tracking_data$DateTime <- paste(
grouped_tracking_data$Date,
grouped_tracking_data$Time
)
op <-
options(digits.secs = 3)
grouped_tracking_data$Time <-
strptime(
grouped_tracking_data$Time,
format = "%H:%M:%OS"
)
grouped_tracking_data <- grouped_tracking_data %>%
mutate(DateTime = as.POSIXct(DateTime, format = "%Y-%m-%d %H:%M:%OS"))
# order the df by User, date, and time
grouped_tracking_data <-
grouped_tracking_data[
with(
grouped_tracking_data,
order(UserId, Date, Time)
),
]
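The date and time handling above can be illustrated on two timestamps taken from the `str()` output. This is only a sketch: the `tz = "UTC"` argument is an assumption added so that the example does not depend on the local time zone, and `old_options`, `raw_time`, `clock_time`, `date_time`, and `elapsed_seconds` are illustrative names.

```r
# Keep millisecond precision when printing
old_options <- options(digits.secs = 3)

# Time column as read from the CSV: its date part does not match Date,
# so only the clock time is kept
raw_time <- c("2022-09-29 11:03:13.290", "2022-09-29 11:03:14.109")
clock_time <- sapply(strsplit(raw_time, " "), "[[", 2)

# Combine the date from the Date column with the clock time;
# %OS parses fractional seconds
date_time <- as.POSIXct(
  paste("2020-05-16", clock_time),
  format = "%Y-%m-%d %H:%M:%OS",
  tz = "UTC" # assumption, for reproducibility only
)

# Elapsed time between the two events, in seconds
elapsed_seconds <- as.numeric(
  difftime(date_time[2], date_time[1], units = "secs")
)

options(old_options) # restore the previous setting
```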
Next, we determine how far the participants’ location shifts between consecutive navigation blocks. This essentially tells us how much, and in what direction, the participant moved in the text (if at all).
To count the number of pages turned, we first create variables that check whether consecutive navigation blocks (one per row) should be compared. We only want to compare navigation blocks from a) the same user, b) the same reading session, and c) the same reading set up (see below).
The reading set up affects the number of ‘pages’ that the short story has; for example, in a small browser window the story is organised into more ‘pages’ than in a large browser window. Pages turned should only be determined between events that have the same reading set up. Note that this does not limit our ability to detect the linearity of navigation, because a change in the reading set up either results in an event that is in its own navigation block (which can then be compared to a subsequent navigation block; this happens e.g. when the participant resizes the browser window) or starts a new reading session (e.g. when the participant changes device).
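The three comparability checks can be sketched on a toy data frame. The helper `differs_from_previous()` and the column `TotalPages` are hypothetical names used only for this illustration, and the sketch uses base R rather than `lag()`.

```r
# Hypothetical navigation blocks, already ordered by user and time
blocks <- data.frame(
  UserId = c(6, 6, 6, 7, 7),
  ReadingSessionNumber = c(1, 1, 2, 1, 1),
  TotalPages = c(31, 31, 31, 31, 45)
)

# TRUE when a value differs from the one on the previous row;
# the first row has no predecessor and is treated as a change
differs_from_previous <- function(x) {
  c(TRUE, x[-1] != x[-length(x)])
}

blocks$IsNewUser <- differs_from_previous(blocks$UserId)
blocks$IsNewSession <- differs_from_previous(blocks$ReadingSessionNumber)
blocks$IsReadingSetUpChange <- differs_from_previous(blocks$TotalPages)

# A row is comparable with the previous one only if nothing changed
blocks$ComparableWithPrevious <- !(blocks$IsNewUser |
  blocks$IsNewSession |
  blocks$IsReadingSetUpChange)

print(blocks)
```

Only the second row is comparable with its predecessor: the third row starts a new session, the fourth a new user, and the fifth a new page layout.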
First, we create tests that will be useful in determining linearity:
grouped_tracking_data$IsNewUser <-
(
grouped_tracking_data$UserId !=
(lag(grouped_tracking_data$UserId, 1))
)
grouped_tracking_data$IsNewSession <-
(
grouped_tracking_data$ReadingSessionNumber !=
(lag(grouped_tracking_data$ReadingSessionNumber, 1))
)
grouped_tracking_data$IsReadingSetUpChange <-
(
grouped_tracking_data$NBFirstPagesInSection !=
(lag(grouped_tracking_data$NBFirstPagesInSection, 1))
)
# first event is TRUE for all 3 tests
grouped_tracking_data[1, "IsNewUser"] <- TRUE
grouped_tracking_data[1, "IsNewSession"] <- TRUE
grouped_tracking_data[1, "IsReadingSetUpChange"] <- TRUE
We can then calculate the number of ‘pages’ turned between navigation blocks. The variable is calculated by subtracting a navigation block’s start location (in pages) from the next navigation block’s start location. If the user, reading session, or reading set up changes between the two navigation blocks, ‘PagesTurned’ is instead calculated by comparing the first page in the navigation block to the last page in the same navigation block; this is 0 when the navigation block does not include movement.
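A toy version of this rule, as a sketch: the vectors below are hypothetical, and `comparable_with_next` stands in for the combined user, session, and set up tests on the following row.

```r
# Hypothetical navigation blocks: first and last page of each block
first_page <- c(1, 2, 1, 16, 5)
last_page <- c(2, 2, 15, 16, 7)

# Whether the next block has the same user, session, and reading set up
# (the last block has no successor, so it is not comparable)
comparable_with_next <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# First page of the following block (NA for the last block)
next_first_page <- c(first_page[-1], NA)

# Comparable: distance to the start of the next block;
# otherwise: movement within the block itself (0 if there is none)
pages_turned <- ifelse(
  comparable_with_next,
  next_first_page - first_page,
  last_page - first_page
)

print(pages_turned)
```

Negative values indicate backward movement, positive values forward movement, and 0 no navigation.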
# count the number of pages turned within a navigation block
grouped_tracking_data$PagesTurned <-
ifelse(
( # following navigation block is from
## same user,
## same reading session, and
## same reading set up
!lead(grouped_tracking_data$IsNewUser, 1) &
!lead(grouped_tracking_data$IsNewSession, 1) &
!lead(grouped_tracking_data$IsReadingSetUpChange, 1)
),
(
lead(grouped_tracking_data$NBFirstPage, 1) -
grouped_tracking_data$NBFirstPage
),
ifelse(
( # the navigation block includes movement but
## the navigation block following it has a
### different user,
### different reading session, or
### different reading set up
grouped_tracking_data$NBFirstPage !=
grouped_tracking_data$NBLastPage
),
(
grouped_tracking_data$NBLastPage -
grouped_tracking_data$NBFirstPage
),
# the navigation block doesn't include movement
0
)
)
# Manually set value for PagesTurned on last row of data
LastRow <- nrow(grouped_tracking_data)
grouped_tracking_data[LastRow, "PagesTurned"] <- (
grouped_tracking_data[LastRow, "NBLastPage"] -
grouped_tracking_data[LastRow, "NBFirstPage"]
)
# Check PagesTurned
## - regression, + forward leap or a chronological page turn, 0 not navigation
table(
sign(grouped_tracking_data$PagesTurned)
)
##
## -1 0 1
## 531 1489 1155
PagesTurned tells us the direction and extent of navigation. We use this information to assign each navigation event a label of ‘Regression’ or ‘ForwardLeap’.
Considering that regressions refer to any movement backwards in the text, any navigation block with a negative value for PagesTurned can be labelled as a ‘Regression’ (PagesTurned < 0). Forward leaps include forward movement to a position further than the next page (PagesTurned > 1), either by turning pages at a browsing speed (NBSpeedLabel == “Browsing”) or by using the progress bar (NBAnyProgressBarUsage is TRUE).
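The two labelling rules can be sketched on hypothetical navigation blocks. The values below are invented for illustration; only the conditions themselves follow the definitions above.

```r
# Hypothetical navigation blocks
pages_turned <- c(1, -1, 15, 3, 2)
speed_label <- c("DeepReading", "Scanning", "Browsing", "DeepReading", "Scanning")
used_progress_bar <- c(FALSE, FALSE, FALSE, TRUE, FALSE)

# Regression: any movement backwards
is_regression <- pages_turned < 0

# Forward leap: more than one page forwards, made either
# at browsing speed or with the progress bar
is_forward_leap <- (pages_turned > 1) &
  (speed_label == "Browsing" | used_progress_bar)

print(data.frame(pages_turned, is_regression, is_forward_leap))
```

The last block moves two pages forwards but is neither browsed nor made with the progress bar, so it is not labelled as a forward leap.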
grouped_tracking_data$IsRegression <-
(
grouped_tracking_data$PagesTurned < 0
)
table(grouped_tracking_data$IsRegression)
##
## FALSE TRUE
## 2644 531
grouped_tracking_data$IsForwardLeap <-
(
((grouped_tracking_data$PagesTurned > 1) &
(grouped_tracking_data$NBSpeedLabel == "Browsing")) |
((grouped_tracking_data$PagesTurned > 1) &
(grouped_tracking_data$NBAnyProgressBarUsage))
)
table(grouped_tracking_data$IsForwardLeap)
##
## FALSE TRUE
## 3103 72
The data includes 531 regressions and 72 forward leaps.
Our aim is to measure linearity on each page-view, and so we need to merge grouped_tracking_data with wrangled_tracking_data. First, we load in wrangled_tracking_data:
tracking_data <-
read.csv(
paste0(
mypath_SSRB,
"/Data/wrangled_tracking_data.csv"
),
header = TRUE,
sep = ";",
dec = ","
)
tracking_data <- dplyr::select(tracking_data, -X, -X.1)
## turn user and story indicators into factors
tracking_data[, c(
"UserId",
"StoryId"
)] <- convert.magic(
tracking_data[, c(
"UserId",
"StoryId"
)],
"factor"
)
## turn reading block (page view) number,
### reading session number, and
#### navigation block number into ordered factors
tracking_data[, c(
"ReadingBlockNumber",
"ReadingSessionNumber",
"NavigationBlockNumber"
)] <- convert.magic(
tracking_data[, c(
"ReadingBlockNumber",
"ReadingSessionNumber",
"NavigationBlockNumber"
)],
"ordered"
)
## fix date and time variable types
tracking_data$Date <-
as.Date(
tracking_data$Date,
format = "%Y-%m-%d"
)
TempTimeObject <- strsplit(tracking_data$Time, " ")
TempTimeObject <- sapply(TempTimeObject, "[[", 2)
tracking_data$TempTimeObject <- TempTimeObject
tracking_data <- dplyr::select(tracking_data, -Time)
names(tracking_data)[which(colnames(tracking_data) == "TempTimeObject")] <- "Time"
tracking_data$DateTime <- paste(
tracking_data$Date,
tracking_data$Time
)
op <-
options(digits.secs = 3)
tracking_data$Time <-
strptime(
tracking_data$Time,
format = "%H:%M:%OS"
)
tracking_data <- tracking_data %>%
mutate(DateTime = as.POSIXct(DateTime, format = "%Y-%m-%d %H:%M:%OS"))
Merge the dataframes together:
# columns selected from grouped_tracking_data:
## UserId, NavigationBlockNumber, ReadingSessionNumber
### IsRegression, IsForwardLeap, PagesTurned,
### NBFirstCumulativeRSTime, NBLastCumulativeRSTime, NBSpeedLabel
# Identify these columns first
column_UserId <- which(colnames(grouped_tracking_data) == "UserId")
column_NBNumber <- which(colnames(grouped_tracking_data) == "NavigationBlockNumber")
column_RSNumber <- which(colnames(grouped_tracking_data) == "ReadingSessionNumber")
column_IsRegression <- which(colnames(grouped_tracking_data) == "IsRegression")
column_IsForwardLeap <- which(colnames(grouped_tracking_data) == "IsForwardLeap")
column_PagesTurned <- which(colnames(grouped_tracking_data) == "PagesTurned")
column_NBFirstCumulativeRSTime <- which(colnames(grouped_tracking_data) == "NBFirstCumulativeRSTime")
column_NBLastCumulativeRSTime <- which(colnames(grouped_tracking_data) == "NBLastCumulativeRSTime")
column_NBSpeedLabel <- which(colnames(grouped_tracking_data) == "NBSpeedLabel")
# Merge dfs:
tracking_data <-
merge(
tracking_data,
grouped_tracking_data[, c(
column_UserId,
column_NBNumber,
column_RSNumber,
column_IsRegression,
column_IsForwardLeap,
column_PagesTurned,
column_NBFirstCumulativeRSTime,
column_NBLastCumulativeRSTime,
column_NBSpeedLabel
)],
by = c("UserId", "ReadingSessionNumber", "NavigationBlockNumber"),
all.x = TRUE
)
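As a sketch of what this merge does, the toy example below joins block-level labels onto individual events. The data frames `events` and `blocks` are hypothetical. Because every event inherits the label of its navigation block, a block that spans several events contributes several labelled rows, which would explain why event-level counts are larger than the block-level counts reported earlier.

```r
# Hypothetical event-level data: two events share navigation block 1 of user 1
events <- data.frame(
  UserId = c(1, 1, 1, 2),
  NavigationBlockNumber = c(1, 1, 2, 1),
  Id = 1:4
)

# Hypothetical block-level labels, one row per navigation block
blocks <- data.frame(
  UserId = c(1, 1, 2),
  NavigationBlockNumber = c(1, 2, 1),
  IsRegression = c(FALSE, TRUE, FALSE)
)

# Left join: keep every event, attach the label of its block
merged <- merge(
  events,
  blocks,
  by = c("UserId", "NavigationBlockNumber"),
  all.x = TRUE
)

# merge() may reorder rows, so restore the event order
merged <- merged[order(merged$Id), ]

print(merged)
```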
Next, we create columns that summarise information on nonlinearity based on IsRegression and IsForwardLeap: (1) IsNonlinearNavigation tells us whether the event includes a regression or a forward leap; (2) Linearity tells us the type of linearity (regression, forward leap, or linear/nonnavigation); (3) StartsNonlinearity tells us whether the event initiates nonlinear navigation following linear navigation, nonnavigation, or nonlinear navigation of a different type.
Calculate (1) IsNonlinearNavigation:
tracking_data$IsNonlinearNavigation <-
ifelse(
(tracking_data$IsRegression |
tracking_data$IsForwardLeap),
TRUE,
FALSE
)
table(tracking_data$IsNonlinearNavigation)
##
## FALSE TRUE
## 6424 2350
2350 of the 8774 events include nonlinear navigation of the text (26.78%).
Usage of nonlinear navigation varies between participants:
table(tracking_data$IsNonlinearNavigation, tracking_data$UserId)
##
## 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## FALSE 74 128 122 56 75 90 170 116 433 73 74 118 90 60 78 160 112 44
## TRUE 7 136 8 21 6 17 229 63 28 20 11 20 17 13 18 8 56 3
##
## 24 28 29 30 31 32 34 35 36 37 38 39 40 41 42 43 44 45
## FALSE 91 171 131 47 52 143 102 103 93 67 123 136 94 69 81 113 133 71
## TRUE 2 146 20 0 23 19 30 144 29 4 26 11 22 8 10 9 162 92
##
## 46 47 48 49 50 51 52 53 55 57 58 60 62 63 64 65 66 67
## FALSE 169 66 64 54 83 144 120 69 69 83 89 248 79 88 243 162 102 87
## TRUE 8 4 3 0 18 110 15 2 18 10 68 172 4 21 6 171 13 14
##
## 72 74 75 76 77 92
## FALSE 86 92 145 72 58 59
## TRUE 56 0 150 31 16 2
ggplot(tracking_data, aes(x = IsNonlinearNavigation, group = UserId, fill = UserId)) +
geom_bar(position = position_dodge(), stat = "count") +
theme_classic()
Indeed, the plot indicates that some participants use nonlinearity quite often whereas others use it very rarely.
Calculate (2) Linearity:
tracking_data$Linearity <-
ifelse(
tracking_data$IsRegression,
"Regression",
ifelse(
tracking_data$IsForwardLeap,
"ForwardLeap",
"LinearOrNonNavigation"
)
)
table(tracking_data$Linearity)
##
## ForwardLeap LinearOrNonNavigation Regression
## 687 6424 1663
Of the 2350 nonlinear navigation events, 1663 are regressions (70.77%), and the remaining 687 are forward leaps (29.23%).
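The percentages quoted here can be reproduced from the counts in the table above. The sketch below simply re-enters those counts by hand; the variable names are illustrative.

```r
# Counts copied from the table of Linearity
linearity_counts <- c(
  ForwardLeap = 687,
  LinearOrNonNavigation = 6424,
  Regression = 1663
)

# Share of each type among the nonlinear events, in percent
nonlinear_counts <- linearity_counts[c("Regression", "ForwardLeap")]
nonlinear_share <- round(100 * prop.table(nonlinear_counts), 2)

# Share of nonlinear events among all events, in percent
overall_share <- round(
  100 * sum(nonlinear_counts) / sum(linearity_counts),
  2
)

print(nonlinear_share)
print(overall_share)
```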
Usage of the different linearity types again varies between participants:
table(tracking_data$Linearity, tracking_data$UserId)
##
## 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## ForwardLeap 5 44 0 7 1 5 93 20 0 0 0 1 7 0
## LinearOrNonNavigation 74 128 122 56 75 90 170 116 433 73 74 118 90 60
## Regression 2 92 8 14 5 12 136 43 28 20 11 19 10 13
##
## 20 21 22 23 24 28 29 30 31 32 34 35 36 37
## ForwardLeap 1 0 13 0 0 30 2 0 5 0 5 66 5 0
## LinearOrNonNavigation 78 160 112 44 91 171 131 47 52 143 102 103 93 67
## Regression 17 8 43 3 2 116 18 0 18 19 25 78 24 4
##
## 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## ForwardLeap 8 0 1 0 0 1 72 39 1 0 0 0 3 28
## LinearOrNonNavigation 123 136 94 69 81 113 133 71 169 66 64 54 83 144
## Regression 18 11 21 8 10 8 90 53 7 4 3 0 15 82
##
## 52 53 55 57 58 60 62 63 64 65 66 67 72 74
## ForwardLeap 1 0 0 0 15 23 0 4 0 88 0 0 7 0
## LinearOrNonNavigation 120 69 69 83 89 248 79 88 243 162 102 87 86 92
## Regression 14 2 18 10 53 149 4 17 6 83 13 14 49 0
##
## 75 76 77 92
## ForwardLeap 74 12 0 0
## LinearOrNonNavigation 145 72 58 59
## Regression 76 19 16 2
Note that the number of events is connected to participants’ device size, and so users’ event counts are not directly comparable.
To calculate (3) StartsNonlinearity we first order tracking_data and create tests:
# order the df by User, date, time, and Id
tracking_data <-
tracking_data[
with(
tracking_data,
order(UserId, Date, Time, Id)
),
]
tracking_data$IsNewUser <-
(
tracking_data$UserId !=
(lag(tracking_data$UserId, 1))
)
tracking_data$IsNewSession <-
(
tracking_data$ReadingSessionNumber !=
(lag(tracking_data$ReadingSessionNumber, 1))
)
tracking_data$IsReadingSetUpChange <-
(
tracking_data$TotalPagesInSection !=
(lag(tracking_data$TotalPagesInSection, 1))
)
# first event is TRUE for all 3 tests
tracking_data[1, "IsNewUser"] <- TRUE
tracking_data[1, "IsNewSession"] <- TRUE
tracking_data[1, "IsReadingSetUpChange"] <- TRUE
Then, use tests in calculating StartsNonlinearity:
## Find events that initiate nonlinearity
for (row in seq_len(nrow(tracking_data))) {
if (row == 1 |
tracking_data[row, "IsNewUser"] |
tracking_data[row, "IsNewSession"] |
tracking_data[row, "IsReadingSetUpChange"]) {
# first row, new user, new session, or new set up
if (tracking_data[row, "IsNonlinearNavigation"]) {
# nonlinear navigation
tracking_data[row, "StartsNonlinearity"] <-
TRUE
} else {
# not nonlinear
tracking_data[row, "StartsNonlinearity"] <-
FALSE
}
} else {
# same user, reading session and reading set up
if ((tracking_data[row - 1, "Linearity"] !=
tracking_data[row, "Linearity"]) &
tracking_data[row, "IsNonlinearNavigation"]) {
# linearity type changes between previous and current event
## and current event is nonlinear
tracking_data[row, "StartsNonlinearity"] <-
TRUE
} else {
# the nonlinear event doesn't start nonlinearity
tracking_data[row, "StartsNonlinearity"] <-
FALSE
}
}
}
table(tracking_data$StartsNonlinearity)
##
## FALSE TRUE
## 8250 524
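The same rule can also be written without a loop. The function below is a hypothetical vectorised sketch, not the code used to produce the table above; `is_boundary` stands for "first row, new user, new session, or new reading set up".

```r
# Hypothetical helper: vectorised version of the rule used in the loop
starts_nonlinearity <- function(linearity, is_boundary) {
  is_nonlinear <- linearity != "LinearOrNonNavigation"

  # Linearity of the previous event; the first event has none
  previous <- c(NA, head(linearity, -1))
  changed <- is.na(previous) | previous != linearity

  # At a boundary any nonlinear event starts nonlinearity;
  # otherwise the linearity type must have changed
  is_nonlinear & (is_boundary | changed)
}

# Toy check: the third event continues a regression,
# but it follows a boundary, so it counts as a new initiation
linearity <- c(
  "Regression", "Regression", "Regression",
  "LinearOrNonNavigation", "ForwardLeap"
)
is_boundary <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

result <- starts_nonlinearity(linearity, is_boundary)
print(result)
```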
Out of all events, 524 initiate nonlinearity (5.97%).
table(tracking_data$StartsNonlinearity, tracking_data$UserId)
##
## 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## FALSE 78 246 126 75 77 102 369 161 449 85 80 129 104 68 88 165 154 46
## TRUE 3 18 4 2 4 5 30 18 12 8 5 9 3 5 8 3 14 1
##
## 24 28 29 30 31 32 34 35 36 37 38 39 40 41 42 43 44 45
## FALSE 92 274 139 47 69 153 123 240 112 69 141 143 104 74 86 116 288 153
## TRUE 1 43 12 0 6 9 9 7 10 2 8 4 12 3 5 6 7 10
##
## 46 47 48 49 50 51 52 53 55 57 58 60 62 63 64 65 66 67
## FALSE 169 68 65 54 92 219 126 70 78 87 146 401 81 101 245 308 110 95
## TRUE 8 2 2 0 9 35 9 1 9 6 11 19 2 8 4 25 5 6
##
## 72 74 75 76 77 92
## FALSE 124 92 273 94 67 60
## TRUE 18 0 22 9 7 1
StartsNonlinearity also varies between participants. Because each nonlinear navigation is counted only once, regardless of how many events it spans, this measure does not depend on a participant’s total number of events and can be compared between participants.
Create a summary dataframe to compare participants in their linearity:
participant_navigation_counts <- tracking_data %>%
group_by(UserId) %>%
summarise(
StartsNonlinearityCount = sum(StartsNonlinearity),
IsNonlinearNavigationCount = sum(IsNonlinearNavigation),
IsRegressionCount = sum(IsRegression),
IsForwardLeapCount = sum(IsForwardLeap),
EventCount = n()
)
StaticPlot <- ggplot(
participant_navigation_counts,
aes(x = IsNonlinearNavigationCount, y = StartsNonlinearityCount, colour = UserId)
) +
geom_point() +
theme_classic()
ggplotly(StaticPlot)
IsNonlinearNavigation and StartsNonlinearity are strongly correlated when there are only a few nonlinear navigation events. This indicates that, for these participants, nonlinear events tend to occur separately from each other rather than in long consecutive runs. For example, participant id 55 has 18 nonlinear navigation events, 9 of which initiate nonlinearity. On average, then, participant id 55 used two nonlinear navigation events consecutively (18/9 = 2).
However, this connection is less apparent for participants who have more nonlinear events. For example, participant id 7 has 136 nonlinear navigation events but only 18 initiating events, so each initiated nonlinearity comprised about 7.6 nonlinear events on average (136/18). In contrast, participant id 51 used nonlinear navigation 110 times and initiated nonlinearity 35 times, an average of about 3.1 nonlinear events per initiated nonlinearity (110/35).
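The averages discussed above follow directly from the per-participant counts. The sketch below re-enters the three participants' counts by hand; the column `EventsPerInitiation` is a hypothetical name introduced for this illustration.

```r
# Counts for participants 55, 7, and 51, copied from the tables above
participant_counts <- data.frame(
  UserId = c(55, 7, 51),
  IsNonlinearNavigationCount = c(18, 136, 110),
  StartsNonlinearityCount = c(9, 18, 35)
)

# Average number of nonlinear events per initiated nonlinearity
participant_counts$EventsPerInitiation <- round(
  participant_counts$IsNonlinearNavigationCount /
    participant_counts$StartsNonlinearityCount,
  2
)

print(participant_counts)
```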
Finally, to create a measure of linearity to use in the analysis, we filter the dataframe to only include one event per page-view. We therefore remove events that occur outside of visible page-views, such as disengagements and dialog events (triggered by viewing the information sheet). We then select one event per page-view that includes information on linearity (in particular, “StartsNonlinearity”).
# Remove events that do not occur on a page-view
## 8144 rows (full data 8774 rows)
linearity_measure_data <- tracking_data %>%
filter(
Engagement != "Disengagement" &
Engagement != "Dialog"
)
# make sure the filtered df is correctly ordered
linearity_measure_data <-
linearity_measure_data[
with(
linearity_measure_data,
order(UserId, Date, Time, Id)
),
]
# Select one event per each page-view:
## group events by
### UserId, StoryId, ReadingSessionNumber, ReadingBlockNumber (page-view indicator), and Condition
#### and summarise other important variables into columns
linearity_measure_data <- linearity_measure_data %>%
group_by(UserId, StoryId, ReadingSessionNumber, ReadingBlockNumber, Condition) %>%
summarise(
StartLocation = first(StartLocation),
EndLocation = first(EndLocation),
IsNewUser = any(IsNewUser),
IsNewSession = any(IsNewSession),
IsReadingSetUpChange = any(IsReadingSetUpChange),
IncludesNonlinearity = any(IsNonlinearNavigation),
StartsNonlinearity = any(StartsNonlinearity),
FirstTimeUntilDeadlineDays = first(TimeBeforeDeadlinesDays),
FirstCumulativeRSTime = first(CumulativeRSTime),
WindowWidth = first(WindowWidth)
)
## `summarise()` has grouped output by 'UserId', 'StoryId',
## 'ReadingSessionNumber', 'ReadingBlockNumber'. You can override using the
## `.groups` argument.
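The reduction to one row per page-view can be sketched in base R on a toy data frame. The data below are hypothetical, and `aggregate()` is used here only for illustration in place of `group_by()` and `summarise()`.

```r
# Hypothetical events: user 1 has two events on page-view 1,
# one of which is a disengagement
events <- data.frame(
  UserId = c(1, 1, 1, 2, 2),
  ReadingBlockNumber = c(1, 1, 2, 1, 1),
  Engagement = c(
    "Engagement", "Disengagement", "Engagement",
    "Engagement", "Engagement"
  ),
  StartsNonlinearity = c(FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Keep only events that occur on a visible page-view
page_view_events <- events[
  !(events$Engagement %in% c("Disengagement", "Dialog")),
]

# One row per page-view: TRUE if any of its events starts nonlinearity
per_page_view <- aggregate(
  x = page_view_events["StartsNonlinearity"],
  by = page_view_events[c("UserId", "ReadingBlockNumber")],
  FUN = any
)

print(per_page_view)
```

Note that in this sketch the disengagement event is removed before aggregation, so its value does not contribute to the page-view it belongs to.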
The new df is saved for use in the analysis. The dataset has already been saved, and so the code chunk below is not run.
# write.csv2(
# linearity_measure_data,
# "linearity_measure_data.csv"
# )